Unsupervised Morphological Segmentation Based on Segment Predictability and Word Segments Alignment
Abstract
Word segments are relevant cues for the automatic acquisition of semantic relationships from morphologically related words. Indeed, morphemes are the smallest meaning-bearing units. We present an unsupervised method for the segmentation of words into sub-units devised for this objective. The system relies on segment predictability to discover a set of prefixes and suffixes and performs word segments alignment to detect morpheme boundaries.
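The abstract does not say how segment predictability is measured. As a rough illustration only, the sketch below assumes it can be approximated by successor variety, a classic heuristic that counts how many distinct characters can follow a word-initial substring: a stem that is followed by many different characters is a plausible place for a suffix boundary. The function names, the end-of-word marker, and the threshold are all hypothetical and are not taken from the paper. The alignment step is not shown.

```python
from collections import defaultdict

END = "#"  # hypothetical end-of-word marker, not from the paper


def successor_variety(words):
    """Map each word-initial substring to the number of distinct
    characters (or END) that follow it in the word list."""
    followers = defaultdict(set)
    for word in set(words):
        marked = word + END
        for i in range(1, len(word) + 1):
            followers[word[:i]].add(marked[i])
    return {prefix: len(chars) for prefix, chars in followers.items()}


def split_points(word, variety, threshold=3):
    """Indices i where word[:i] is followed by at least `threshold`
    distinct characters, i.e. where the next character is hard to predict."""
    return [i for i in range(1, len(word))
            if variety.get(word[:i], 0) >= threshold]


def segment(word, variety, threshold=3):
    """Cut the word at every low-predictability point."""
    cuts = [0] + split_points(word, variety, threshold) + [len(word)]
    return [word[a:b] for a, b in zip(cuts, cuts[1:])]


# Toy word list, assumed purely for illustration.
words = ["walk", "walks", "walked", "walking",
         "talk", "talks", "talked", "talking"]
variety = successor_variety(words)
print(segment("walking", variety))
```

This sketch only finds suffix-side boundaries. Running the same counts over reversed words would be one way to propose prefixes, though whether the paper does so is not stated in the abstract.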
Similar resources
Unsupervised Multiword Segmentation of Large Corpora using Prediction-Driven Decomposition of n-grams
We present a new, efficient unsupervised approach to the segmentation of corpora into multiword units. Our method involves initial decomposition of common n-grams into segments which maximize within-segment predictability of words, and then further refinement of these segments into a multiword lexicon. Evaluating in four large, distinct corpora, we show that this method creates segments which c...
Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends Hidden Semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech, m...
Unsupervised Query Segmentation Using Monolingual Word Alignment Method
In this paper, we propose a novel unsupervised approach to query segmentation using the word alignment model usually adopted in statistical machine translation systems. Query segmentation obtains complete phrases or concepts in a query by segmenting a sequence of query terms; it is an important query processing step for improving information retrieval performance in search ...
Linguistically Motivated Unsupervised Segmentation for Machine Translation
In this paper we use statistical machine translation and morphology information from two different morphological analyzers to try to improve translation quality by linguistically motivated segmentation. The morphological analyzers we use are the unsupervised Morfessor morpheme segmentation and analyzer toolkit and the rule-based morphological analyzer T3. Our translations are done using the Mos...
Mixed-lingual spoken word recognition by using VQ codebook sequences of variable length segments
We are investigating unsupervised phone modeling. This paper describes a derivation method of VQ codebook sequences of variable length segments from spoken word samples, and also describes evaluation results by applying the method to mixed-lingual speech recognition tasks which include non-native speakers. The VQ codebook is generated based on a piecewise linear segmentation method which includ...